Goto

Collaborating Authors

 weighted average


Synthetic Design: An Optimization Approach to Experimental Design with Synthetic Controls

Neural Information Processing Systems

We investigate the optimal design of experimental studies that have pre-treatment outcome data available. The average treatment effect is estimated as the difference between the weighted average outcomes of the treated and control units. A number of commonly used approaches fit this formulation, including the difference-inmeans estimator and a variety of synthetic-control techniques. We propose several methods for choosing the set of treated units in conjunction with the weights. Observing the NP-hardness of the problem, we introduce a mixed-integer programming formulation which selects both the treatment and control sets and unit weightings. We prove that these proposed approaches lead to qualitatively different experimental units being selected for treatment. We use simulations based on publicly available data from the USBureau of Labor Statistics that show improvements in terms of mean squared error and statistical power when compared to simple and commonly used alternatives such as randomized trials.


Adaptive Averaging in Accelerated Descent Dynamics

Neural Information Processing Systems

We study accelerated descent dynamics for constrained convex optimization. This dynamics can be described naturally as a coupling of a dual variable accumulating gradients at a given rate ฮท(t), and a primal variable obtained as the weighted average of the mirrored dual trajectory, with weights w(t). Using a Lyapunov argument, we give sufficient conditions on ฮท and wto achieve a desired convergence rate. As an example, we show that the replicator dynamics (an example of mirror descent on the simplex) can be accelerated using a simple averaging scheme. We then propose an adaptive averaging heuristic which adaptively computes the weights to speed up the decrease of the Lyapunov function. We provide guarantees on adaptive averaging in continuous-time, prove that it preserves the quadratic convergence rate of accelerated first-order methods in discrete-time, and give numerical experiments to compare it with existing heuristics, such as adaptive restarting. The experiments indicate that adaptive averaging performs at least as well as adaptive restarting, with significant improvements in some cases.





Asymptotic normality and confidence intervals for derivatives of 2-layers neural network in the random features model

Neural Information Processing Systems

This paper studies two-layers Neural Networks (NN), where the first layer contains random weights, and the second layer is trained using Ridge regularization. This model has been the focus of numerous recent works, showing that despite its simplicity, it captures some of the empirically observed behaviors of NN in the overparametrized regime, such as the double-descent curve where the generalization error decreases as the number of weights increases to $+\infty$. This paper establishes asymptotic distribution results for this 2-layers NN model in the regime where the ratios $\frac p n$ and $\frac d n$ have finite limits, where $n$ is the sample size, $p$ the ambient dimension and $d$ is the width of the first layer. We show that a weighted average of the derivatives of the trained NN at the observed data is asymptotically normal, in a setting with Lipschitz activation functions in a linear regression response with Gaussian features under possibly non-linear perturbations. We then leverage this asymptotic normality result to construct confidence intervals (CIs) for single components of the unknown regression vector. The novelty of our results are threefold: (1) Despite the nonlinearity induced by the activation function, we characterize the asymptotic distribution of a weighted average of the gradients of the network after training; (2) It provides the first frequentist uncertainty quantification guarantees, in the form of valid ($1\text{-}\alpha$)-CIs, based on NN estimates; (3) It shows that the double-descent phenomenon occurs in terms of the length of the CIs, with the length increasing and then decreasing as $\frac d n\nearrow +\infty$ for certain fixed values of $\frac p n$. We also provide a toolbox to predict the length of CIs numerically, which lets us compare activation functions and other parameters in terms of CI length.


Goal inference with Rao-Blackwellized Particle Filters

arXiv.org Artificial Intelligence

Inferring the eventual goal of a mobile agent from noisy observations of its trajectory is a fundamental estimation problem. We initiate the study of such intent inference using a variant of a Rao-Blackwellized Particle Filter (RBPF), subject to the assumption that the agent's intent manifests through closed-loop behavior with a state-of-the-art provable practical stability property. Leveraging the assumed closed-form agent dynamics, the RBPF analytically marginalizes the linear-Gaussian substructure and updates particle weights only, improving sample efficiency over a standard particle filter. Two difference estimators are introduced: a Gaussian mixture model using the RBPF weights and a reduced version confining the mixture to the effective sample. We quantify how well the adversary can recover the agent's intent using information-theoretic leakage metrics and provide computable lower bounds on the Kullback-Leibler (KL) divergence between the true intent distribution and RBPF estimates via Gaussian-mixture KL bounds. We also provide a bound on the difference in performance between the two estimators, highlighting the fact that the reduced estimator performs almost as well as the complete one. Experiments illustrate fast and accurate intent recovery for compliant agents, motivating future work on designing intent-obfuscating controllers.


Exploring Vulnerability in AI Industry

arXiv.org Artificial Intelligence

The rapid ascent of Foundation Models (FMs), enabled by the Transformer architecture, drives the current AI ecosystem. Characterized by large-scale training and downstream adaptability, FMs (as GPT family) have achieved massive public adoption, fueling a turbulent market shaped by platform economics and intense investment. Assessing the vulnerability of this fast-evolving industry is critical yet challenging due to data limitations. This paper proposes a synthetic AI Vulnerability Index (AIVI) focusing on the upstream value chain for FM production, prioritizing publicly available data. We model FM output as a function of five inputs: Compute, Data, Talent, Capital, and Energy, hypothesizing that supply vulnerability in any input threatens the industry. Key vulnerabilities include compute concentration, data scarcity and legal risks, talent bottlenecks, capital intensity and strategic dependencies, as well as escalating energy demands. Acknowledging imperfect input substitutability, we propose a weighted geometrical average of aggregate subindexes, normalized using theoretical or empirical benchmarks. Despite limitations and room for improvement, this preliminary index aims to quantify systemic risks in AI's core production engine, and implicitly shed a light on the risks for downstream value chain.


MergeMoE: Efficient Compression of MoE Models via Expert Output Merging

arXiv.org Artificial Intelligence

The Mixture-of-Experts (MoE) technique has proven to be a promising solution to efficiently scale the model size, which has been widely applied in recent LLM advancements. However, the substantial memory overhead of MoE models has made their compression an important research direction. In this work, we provide a theoretical analysis of expert merging, a recently proposed technique for compressing MoE models. Rather than interpreting expert merging from the conventional perspective of parameter aggregation, we approach it from the perspective of merging experts' outputs. Our key insight is that the merging process can be interpreted as inserting additional matrices into the forward computation, which naturally leads to an optimization formulation. Building on this analysis, we introduce MergeMoE, a method that leverages mathematical optimization to construct the compression matrices. We evaluate MergeMoE on multiple MoE models and show that our algorithm consistently outperforms the baselines with the same compression ratios.